Adding GCS dependency for backend and prompt service #1106

Open
wants to merge 8 commits into
base: main

harini-venkataraman
Contributor

What

Adds the pinned gcsfs dependency needed by the backend and the prompt service.
...

Why

The integrator uses fsspec, which needs the matching file storage package installed alongside it. For GCS that package is gcsfs, so it is added here.
...

How

Pinned the dependency version in pyproject.toml.
...

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No, this PR only adds a dependency.
...

Database Migrations

Not applicable.
...

Env Config

Not applicable.
...

Relevant Docs

Related Issues or PRs

Not applicable.
...

Dependencies Versions

Not applicable.
...

Notes on Testing

In the backend, gcsfs is already present as a transitive dependency through connectors.
This PR pins its version explicitly.
...

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

@@ -33,6 +33,7 @@ dependencies = [
     "social-auth-app-django==5.3.0",  # For OAuth
     "social-auth-core==4.4.2",  # For OAuth
     "unstract-sdk~=0.56.0rc4",
+    "gcsfs==2024.6.0",
Contributor

@harini-venkataraman Won't we also need the dependencies for Azure and S3 if this one is needed? @gaya3-zipstack @hari-kuriakose

Contributor

@gaya3-zipstack I have a question: shouldn't we be adding this in the SDK?

Contributor

@ritwik-g Adding it to the SDK would be unnecessary baggage, since the SDK only uses the fsspec APIs. The integrator can decide which filesystem is required and add the dependencies accordingly.
For example, tools need MinIO and not GCS, so adding GCS to the SDK would be unnecessary for tools.
However, we could support installing the SDK selectively, with the consumer explicitly stating which filesystem packages it needs. We could take that up later.
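
To illustrate the point about integrators choosing the filesystem: fsspec resolves a protocol such as gcs or s3 to a separate package (gcsfs, s3fs, or adlfs for Azure), and that package has to be installed by whoever uses the protocol. The following is a minimal sketch, assuming a hypothetical helper named missing_backends that a service might run at startup to report which packages are still missing. The helper and its is_installed parameter are illustrative only and are not part of fsspec or the SDK.

```python
from importlib.util import find_spec
from typing import Callable, Iterable

# Well-known fsspec protocol -> implementing package pairings.
BACKEND_PACKAGES = {
    "gcs": "gcsfs",
    "s3": "s3fs",
    "abfs": "adlfs",  # Azure Blob Storage / Data Lake
}


def _default_is_installed(package: str) -> bool:
    # find_spec returns None when a top-level package cannot be found.
    return find_spec(package) is not None


def missing_backends(
    protocols: Iterable[str],
    is_installed: Callable[[str], bool] = _default_is_installed,
) -> list[str]:
    """Return the packages still to be installed for the given protocols."""
    missing = []
    for protocol in protocols:
        try:
            package = BACKEND_PACKAGES[protocol]
        except KeyError:
            raise ValueError(f"unsupported protocol: {protocol!r}") from None
        if not is_installed(package):
            missing.append(package)
    return missing


if __name__ == "__main__":
    # Hypothetical startup check for a service that only needs GCS.
    print(missing_backends(["gcs"]))
```

A service that only talks to GCS would pass ["gcs"], while one that must support all three providers would pass every protocol it expects to serve.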

Contributor

Okay but let's make sure to add the Azure and S3 dependencies as well.

Contributor Author

S3 and Azure are not used in the backend yet; we can add them when we do the integration.

Contributor

ritwik-g commented Jan 31, 2025

@harini-venkataraman @gaya3-zipstack We will need it as soon as we release to production for on-prem customers. This is not a separate requirement: support for all three of S3, GCS, and Azure storage needs to be present. We can test it later. My suggestion is to take care of this now so that the effort to make it on-prem ready is minimal. Wherever we need Google storage, we will need S3 and Azure as well.

Contributor

Anyway, this is just my view. If you think it is better to proceed without them, please go ahead.

@@ -20,11 +17,6 @@ def __init__(self, log_level: LogLevel = LogLevel.INFO, org_id: str = "") -> Non
self.log_level = log_level
self.org_id = org_id
self.workflow_filestorage = None
if check_feature_flag_status(FeatureFlag.REMOTE_FILE_STORAGE):
Contributor

@harini-venkataraman I wonder whether we should initialise workflow_filestorage to GCS here. That would save us from checks like provider==gcs in the SDK PR.
This assumes that the PromptIdeBase tool is used only in the context of PromptStudio and by no other tool.
I also think that renaming workflow_filestorage to something more generic, like filestorage, would look much cleaner, but we can do that later.
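
The suggestion above can be sketched as follows: pick the storage provider once in the constructor so that later code does not need to branch on it. This is only an illustration under stated assumptions. FileStorageSketch and PromptToolSketch are hypothetical stand-ins, not the SDK's classes, and the remote_storage_enabled argument stands in for the REMOTE_FILE_STORAGE feature flag check shown in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStorageSketch:
    """Hypothetical stand-in for the SDK's file storage object."""

    provider: str


class PromptToolSketch:
    """Hypothetical stand-in for the tool class touched by the diff."""

    def __init__(self, org_id: str = "", remote_storage_enabled: bool = False) -> None:
        self.org_id = org_id
        # Decide the provider once, here, instead of checking it on every call.
        self.workflow_filestorage: Optional[FileStorageSketch] = (
            FileStorageSketch(provider="gcs") if remote_storage_enabled else None
        )


if __name__ == "__main__":
    tool = PromptToolSketch(org_id="org-1", remote_storage_enabled=True)
    print(tool.workflow_filestorage)
```

With this shape, callers read tool.workflow_filestorage directly, and the only remaining check is whether remote storage is enabled at all.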

Contributor

hari-kuriakose left a comment

@harini-venkataraman If gcsfs, s3fs, etc. are required only by code in the SDK, then including them as direct dependencies downstream is not good for the long term.

However, making the packages for all cloud providers default dependencies of the SDK would also bloat it.

Thus, I suggest that we use optional dependencies in the SDK's pyproject.toml, with a section like the following:

[project.optional-dependencies]
aws = [
  "s3fs~=2024.10.0"
]
azure = [
]
gcp = [
  "gcsfs~=2024.6"
]

This will allow downstream services to install the SDK selectively by declaring dependencies such as unstract-sdk, unstract-sdk[gcp], etc.

NOTE: It is relatively OK to release with only GCP support first.
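
One way the SDK side of this suggestion could look is a guarded import that tells the user which extra to install when a backend is missing. The sketch below is an assumption, not existing SDK code: require_extra and EXTRA_MODULES are hypothetical names, and the extra names gcp and aws are taken from the proposed section above.

```python
import importlib
from types import ModuleType
from typing import Mapping

# Extra name -> module it provides, mirroring the proposed
# optional-dependencies section. The mapping itself is an assumption.
EXTRA_MODULES = {"gcp": "gcsfs", "aws": "s3fs"}


def require_extra(
    extra: str, modules: Mapping[str, str] = EXTRA_MODULES
) -> ModuleType:
    """Import the backend for an extra, or explain how to install it."""
    module_name = modules[extra]
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; "
            f"install it with: pip install 'unstract-sdk[{extra}]'"
        ) from exc
```

A downstream service that declares unstract-sdk[gcp] would then get gcsfs installed automatically, while a tool that declares plain unstract-sdk would only see this error if it actually tried to use GCS.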

cc @ritwik-g @gaya3-zipstack @muhammad-ali-e

Contributor

github-actions bot commented Feb 4, 2025

| filepath | function | passed | SUBTOTAL |
| --- | --- | --- | --- |
| runner/src/unstract/runner/clients/test_docker.py | test_logs | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_cleanup_skip | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_client_init | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image_exists | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_image | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_get_container_run_config_without_mount | 1 | 1 |
| runner/src/unstract/runner/clients/test_docker.py | test_run_container | 1 | 1 |
| TOTAL | | 9 | 9 |

sonarqubecloud bot commented Feb 4, 2025

5 participants